SI649-24 Fall -> Altair I¶
Overview¶
We're going to re-create some of the visualizations we made in Tableau for the article “The Dollar-And-Cents Case Against Hollywood’s Exclusion of Women”, this time using Altair. We'll be teaching you different pieces of Altair over the next few weeks, so we'll focus on just a few visualizations this time:
- Replicate 1 visualization from the original article
- Implement 2 new visualizations according to our specifications
For this lab, we have done all of the necessary data transformation for you. You do not need to modify any DataFrame. You only need to write Altair code.
Lab Instructions (read the full version on the handout of the previous lab)¶
- Save, rename, and submit the ipynb file (use your username in the name).
- Run every cell (do Runtime -> Restart and run all to make sure you have a clean working version), print to PDF, and submit the PDF file.
- For each visualization, we will ask you to write down a "Grammar of Graphics" plan first (basically a description of what you'll code).
- If you end up stuck, show us your work by including links (URLs) that you have searched for. You'll get partial credit for showing your work in progress.
- There are many bonus point opportunities in this lab.
We encourage you to go through the Altair tutorials before next week:
Resources¶
# imports we will use
import altair as alt
import pandas as pd
from collections import defaultdict
alt.renderers.enable('html')  # run this line if you are running Jupyter Notebook
# load data and perform basic data processing
# get the CSV
datasetURL="https://raw.githubusercontent.com/dallascard/SI649_public/master/altair_hw1/movies_individual_task.csv"
movieDF=pd.read_csv(datasetURL, encoding="latin-1")
# fix the result column, rename the values, and combine "dubious" with "ok" as "Passes Bechdel Test"
movieDF['test_result'] = movieDF['clean_test'].map({
"ok":"Passes Bechdel Test",
"men":'Women only talk about men',
"notalk":"Women don't talk to each other",
"nowomen":"Fewer than two women",
"dubious":"Passes Bechdel Test"
})
# fix the location column to combine US and Canada
locationDict = defaultdict(lambda: 'International')
locationDict["United States"]="U.S. and Canada"
locationDict["Canada"]="U.S. and Canada"
movieDF["country_binary"]=movieDF["country"].map(locationDict)
# calculate ROI (Return on Investment) both domestic (US and Canada) and international
movieDF["roi_dom"]=movieDF["domgross_2013$"]/movieDF["budget_2013$"]
movieDF["int_only_gross"]=movieDF["intgross_2013$"]-movieDF["domgross_2013$"]
movieDF["roi_int"]=movieDF["int_only_gross"]/movieDF["budget_2013$"]
# drop the columns we won't need
movieDF=movieDF.drop(columns=["Unnamed: 0","test","budget","domgross","intgross","code","period code","decade code","director","imdb"])
# Make a copy of the data frame that excludes movies from before 1990
movieDF_since_1990=movieDF[movieDF.year>1989]
# take a look at the new dataset
movieDF_since_1990.sample(3)
| | year | title | clean_test | binary | budget_2013$ | domgross_2013$ | intgross_2013$ | director_gender | genre | rating | country | language | test_result | country_binary | roi_dom | int_only_gross | roi_int |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1367 | 1998 | Rush Hour | ok | PASS | 50019631 | 201774709.0 | 350566156.0 | male | Action | 7.0 | United States | English | Passes Bechdel Test | U.S. and Canada | 4.033910 | 148791447.0 | 2.974661 |
| 1096 | 2002 | Gerry | nowomen | FAIL | 9066255 | 329860.0 | 329860.0 | male | Adventure | 6.2 | United States | English | Fewer than two women | U.S. and Canada | 0.036383 | 0.0 | 0.000000 |
| 988 | 2004 | The Butterfly Effect | men | FAIL | 16031507 | 71432302.0 | 118444284.0 | male | Sci-Fi | 7.7 | United States | English | Women only talk about men | U.S. and Canada | 4.455745 | 47011982.0 | 2.932474 |
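The preprocessing above rests on two small ideas: a `defaultdict` that sends every unlisted country to "International", and ROI computed as gross divided by budget. The sketch below reproduces both in plain Python using the Rush Hour row from the sample output. The helper name `compute_roi` is an assumption made for illustration and is not part of the lab code.

```python
from collections import defaultdict

# Any country not listed explicitly falls back to 'International'.
location = defaultdict(lambda: "International")
location["United States"] = "U.S. and Canada"
location["Canada"] = "U.S. and Canada"


def compute_roi(budget, domgross, intgross):
    """Return (roi_dom, int_only_gross, roi_int) using the lab's formulas.

    Hypothetical helper: mirrors the three DataFrame column assignments.
    """
    roi_dom = domgross / budget
    int_only_gross = intgross - domgross  # subtract domestic to get the international-only part
    roi_int = int_only_gross / budget
    return roi_dom, int_only_gross, roi_int


# Values copied from the Rush Hour row in the sample above.
roi_dom, int_only_gross, roi_int = compute_roi(50019631, 201774709.0, 350566156.0)
print(round(roi_dom, 6), int_only_gross, round(roi_int, 6))

# Lookup with [] triggers the default; pandas Series.map does the same
# for dict subclasses that define __missing__, such as defaultdict.
print(location["Canada"], "|", location["France"])
```

With a plain `dict`, `Series.map` would turn unlisted countries into NaN, which is why the lab uses a `defaultdict` here.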
Part 1: Recreate this visualization¶

Step 1: Write down your plan for each part of this chart:¶
For each chart, we are asking you to write a Grammar of Graphics plan for the chart. This involves writing down 1) the dataset you will use; 2) the type of mark you will use (e.g., bar, line, point, etc.), and 3) for each visual channel (e.g., position, color, etc.), the corresponding variable name (e.g., year, ROI, etc.) and data type (i.e., ordinal, nominal, or quantitative). Please use the following format:
- Data Name: dataset
- Mark type: mark type
- Encoding Specification:
- channel:variable:datatype
- channel:variable:datatype
- ...
Hint: you should provide encoding specifications for both x and y, using the format channel:variable:datatype. For example, if we wanted to encode a nominal variable called "movietype" as the color, we would write:
- color : movietype : nominal
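A plan line in this format translates almost directly into Altair's shorthand, where the data type becomes a one-letter suffix. The sketch below shows that translation; `plan_to_shorthand` and `TYPE_CODES` are hypothetical names used only for illustration and are not part of Altair.

```python
# Altair's one-letter type codes for the data types used in a plan.
TYPE_CODES = {"quantitative": "Q", "ordinal": "O", "nominal": "N", "temporal": "T"}


def plan_to_shorthand(plan_line):
    """Convert 'channel : variable : datatype' to (channel, 'variable:CODE').

    Hypothetical helper for illustration; not part of Altair.
    """
    channel, variable, datatype = (part.strip() for part in plan_line.split(":"))
    return channel, f"{variable}:{TYPE_CODES[datatype.lower()]}"


channel, shorthand = plan_to_shorthand("color : movietype : nominal")
print(channel, shorthand)  # color movietype:N
```

The resulting pair corresponds to a keyword argument of `encode()`, for example `encode(color='movietype:N')`.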
*** Edit this cell to be your visualization plan (required) ***:¶
Left Chart:
- Data Name: movieDF_since_1990
- mark type: bar
- Encoding Specification:
- x:median(roi_dom) (dollars earned for every dollar spent):quantitative
- y:test_result (Bechdel test category):nominal
- color: fixed blue to represent U.S. and Canada (a constant, not a data field)
Right Chart:
- Data Name: movieDF_since_1990
- mark type: bar
- Encoding Specification:
- x:median(roi_int) (dollars earned for every dollar spent):quantitative
- y:test_result (Bechdel test category):nominal
- color: fixed orange to represent International (a constant, not a data field)
Compound Method (how to join these charts together?): Use horizontal concatenation (alt.hconcat) so that the U.S./Canada and International charts sit side by side for easy comparison.
Step 2: Create your chart.¶
Please use the checkpoints below to work through the problem step-by-step. You can search for the keyword "TODO" to locate cells that need your edits.
Visualization 1 Checkpoints¶
checkpoint 1: create the left chart as a basic bar chart (Domestic ROI by Bechdel test category)¶
- Specify the correct mark
- Use the correct x and y encoding
- Plot the right data (hint: make sure you examine the data frame and use the correct columns)
Your chart will look like:

left_chart = alt.Chart(movieDF_since_1990).mark_bar().encode(
x=alt.X('median(roi_dom):Q', title='Median of roi_dom'),
y=alt.Y('test_result:N', sort=['Fewer than two women','Passes Bechdel Test', "Women don't talk to each other", 'Women only talk about men'], title='test_result')
)
left_chart
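One way to confirm that the bars plot the right data is to compute the same aggregate in pandas and compare it with the chart. The sketch below uses a small made-up DataFrame (the values are not from the lab dataset) with the same column names. On the real data, the same `groupby` call would be applied to `movieDF_since_1990`.

```python
import pandas as pd

# Toy stand-in for movieDF_since_1990; the numbers are invented for illustration.
toy = pd.DataFrame({
    "test_result": ["Passes Bechdel Test", "Passes Bechdel Test",
                    "Fewer than two women", "Fewer than two women",
                    "Fewer than two women"],
    "roi_dom": [1.0, 2.0, 0.5, 2.0, 4.0],
})

# Pandas counterpart of Altair's 'median(roi_dom):Q' grouped by 'test_result:N'.
medians = toy.groupby("test_result")["roi_dom"].median()
print(medians)
```

If the numbers printed for the real data match the bar lengths, the encoding is reading the intended columns.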
checkpoint 2: sort the categories on the y-axis¶
- completed checkpoint 1
- applied the correct sort order to the values on the y-axis (i.e., from top to bottom, the order of the bars is "Passes Bechdel Test", "Women only talk about men", "Women don't talk to each other", "Fewer than two women")
Your chart will look like:

Hint: Sort
left_chart = alt.Chart(movieDF_since_1990).mark_bar().encode(
x=alt.X('median(roi_dom):Q', title='Median of roi_dom'),
y=alt.Y('test_result:N', sort=['Passes Bechdel Test', 'Women only talk about men', "Women don't talk to each other", 'Fewer than two women'], title='test_result')
)
left_chart
# checkpoint 3: hide the x-axis, remove the axis titles, and add a chart title
left_chart = alt.Chart(movieDF_since_1990).mark_bar().encode(
x=alt.X('median(roi_dom):Q', title=None, axis=None),
y=alt.Y('test_result:N', sort=['Passes Bechdel Test', 'Women only talk about men', "Women don't talk to each other", 'Fewer than two women'], title=None)
).properties(
title='U.S. and Canada'
)
left_chart
checkpoint 4: Reshape the plot¶
- completed checkpoint 3
- Reshape the plot to have both width and height equal to 100
Your chart will look like:

Hint: set the width and height properties of the chart
left_chart = alt.Chart(movieDF_since_1990).mark_bar().encode(
x=alt.X('median(roi_dom):Q', title=None, axis=None),
y=alt.Y('test_result:N', sort=['Passes Bechdel Test', 'Women only talk about men', "Women don't talk to each other", 'Fewer than two women'], title=None)
).properties(
title='U.S. and Canada',
width=100,
height=100
)
left_chart
checkpoint 5: Add a text layer with the numbers for each bar¶
- completed checkpoint 4
- add the numbers for each bar with correct formatting (two decimal places)
Your chart will look like:

Hint 1: In Altair you can overlay two charts on top of each other using the "+" notation (e.g., chart1 + chart2)
Hint 2: You can create a text layer that inherits everything from the base layer by using base.mark_text().encode(text="..."), where base is the name of the base bar chart, and the "..." is the data to show as text
Hint 3: Use the "dx" property of mark_text() to nudge the text left or right (see https://altair-viz.github.io/gallery/bar_chart_with_labels.html)
base = alt.Chart(movieDF_since_1990).mark_bar().encode(
x=alt.X('median(roi_dom):Q', title=None, axis=None),
y=alt.Y('test_result:N', sort=['Passes Bechdel Test', 'Women only talk about men', "Women don't talk to each other", 'Fewer than two women'], title=None)
).properties(
title='U.S. and Canada',
width=100,
height=100
)
text = base.mark_text(
align='left',
baseline='middle',
dx=3
).encode(
text=alt.Text('median(roi_dom):Q', format='.2f')
)
left_chart = base + text
left_chart
checkpoint 6: remove the x-axis line and chart box, and increase the padding between bars¶
- completed checkpoint 5
- remove the x-axis line
- remove the box around the figure
- increase the padding between bars
Your chart will look like:

Hint 1: You can make use of configure_axis() and configure_view() of the overall view (base and text layer combined)
Hint 2: There are multiple ways to increase the spacing between bars
left_chart = (base + text).configure_axis(
grid=False
).configure_view(
strokeOpacity=0
).configure_bar(
size=20
)
left_chart
checkpoint 7: create the right chart using International ROI with the same stylings as the left chart¶
- completed checkpoint 6
- create the right chart with the same stylings as the left chart
- correct data
- correct mark
- correct encoding
- apply correct sort order
- no x- and y-axis labels
- no x-axis
- no box on chart
- text labels with proper formatting and alignment
- include the title for International ROI
Your chart will look like:

right_chart_base = alt.Chart(movieDF_since_1990).mark_bar().encode(
x=alt.X('median(roi_int):Q', title=None, axis=None),
y=alt.Y('test_result:N', sort=['Passes Bechdel Test', 'Women only talk about men', "Women don't talk to each other", 'Fewer than two women'], title=None)
).properties(
title='International',
width=100,
height=100
)
right_chart_text = right_chart_base.mark_text(
align='left',
baseline='middle',
dx=3
).encode(
text=alt.Text('median(roi_int):Q', format='.2f')
)
right_chart = (right_chart_base + right_chart_text).configure_axis(
grid=False
).configure_view(
strokeOpacity=0
).configure_bar(
size=20
)
right_chart
checkpoint 8: remove y-axis labels and change color¶
- completed checkpoint 7
- remove y-axis labels
- set the bar color
Your chart will look like:

right_chart = (right_chart_base + right_chart_text).configure_axis(
grid=False,
labels=False
).configure_view(
strokeOpacity=0
).configure_bar(
size=20
).configure_mark(
color='orange'
)
right_chart
checkpoint 9: combine the two charts together¶
- display both completed charts side by side
Your chart will look like:

Hint: You will need to move your view and axis configurations to the overall combined chart!
# Base chart for U.S. and Canada
base = alt.Chart(movieDF_since_1990).mark_bar(color='steelblue').encode(
x=alt.X('median(roi_dom):Q', title=None, axis=None),
y=alt.Y('test_result:N', sort=[
'Passes Bechdel Test',
'Women only talk about men',
"Women don't talk to each other",
'Fewer than two women'
], title=None)
).properties(
title='U.S. and Canada',
width=100,
height=100
)
# Add text to U.S. and Canada chart
text = base.mark_text(
align='left',
baseline='middle',
dx=3
).encode(
text=alt.Text('median(roi_dom):Q', format='.2f')
)
left_chart = base + text
# Base chart for International
right_chart_base = alt.Chart(movieDF_since_1990).mark_bar(color='orange').encode(
x=alt.X('median(roi_int):Q', title=None, axis=None),
y=alt.Y('test_result:N', sort=[
'Passes Bechdel Test',
'Women only talk about men',
"Women don't talk to each other",
'Fewer than two women'
], axis=None, title=None)
).properties(
title='International',
width=100,
height=100
)
# Add text to International chart
right_chart_text = right_chart_base.mark_text(
align='left',
baseline='middle',
dx=3
).encode(
text=alt.Text('median(roi_int):Q', format='.2f')
)
right_chart = right_chart_base + right_chart_text
# Concatenate the two charts
combined_chart = alt.hconcat(
left_chart,
right_chart
).configure_axis(
grid=False,
domain=False # This removes the axis lines
).configure_view(
strokeOpacity=0
).configure_bar(
size=20
)
# Display the combined chart
combined_chart
BONUS: add an overall title and add dollar symbols to text marks¶
- complete checkpoint 9
- add an overall title
- add dollar signs to text labels
Your chart will look like:

# Base chart for U.S. and Canada
base = alt.Chart(movieDF_since_1990).mark_bar(color='steelblue').encode(
x=alt.X('median(roi_dom):Q', title=None, axis=None),
y=alt.Y('test_result:N', sort=[
'Passes Bechdel Test',
'Women only talk about men',
"Women don't talk to each other",
'Fewer than two women'
], title=None)
).properties(
title='U.S. and Canada',
width=100,
height=100
)
# Add text to U.S. and Canada chart
text = base.mark_text(
align='left',
baseline='middle',
dx=3
).encode(
text=alt.Text('median(roi_dom):Q', format='$,.2f')
)
left_chart = base + text
# Base chart for International
right_chart_base = alt.Chart(movieDF_since_1990).mark_bar(color='orange').encode(
x=alt.X('median(roi_int):Q', title=None, axis=None),
y=alt.Y('test_result:N', sort=[
'Passes Bechdel Test',
'Women only talk about men',
"Women don't talk to each other",
'Fewer than two women'
], axis=None, title=None)
).properties(
title='International',
width=100,
height=100
)
# Add text to International chart
right_chart_text = right_chart_base.mark_text(
align='left',
baseline='middle',
dx=3
).encode(
text=alt.Text('median(roi_int):Q', format='$,.2f')
)
right_chart = right_chart_base + right_chart_text
# Concatenate the two charts
combined_chart = alt.hconcat(
left_chart,
right_chart
).configure_axis(
grid=False,
domain=False # This removes the axis lines
).configure_view(
strokeOpacity=0
).configure_bar(
size=20
).properties(
title='Dollars Earned for Each Dollar Spent (2013 Dollars)'
)
# Display the combined chart
combined_chart
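The `format` strings passed to `alt.Text` follow D3's number-format mini-language, which is close to Python's own format specification. As a rough cross-check, and assuming the default US English locale for the `$` prefix, the Python sketch below produces the same label text as `'.2f'` and `'$,.2f'`. The function names are illustrative only.

```python
def two_decimals(value):
    """Python analogue of the D3 format '.2f'."""
    return f"{value:.2f}"


def dollars(value):
    """Python analogue of the D3 format '$,.2f' (currency prefix, thousands separator)."""
    return f"${value:,.2f}"


print(two_decimals(2.684))   # 2.68
print(dollars(2.684))        # $2.68
print(dollars(1234.5))       # $1,234.50
```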
Visualization 2: Replicate this visualization¶

*** Step 1: Write down your plan for the visualization (required) ***¶
- Data Name: movieDF
- mark type: line
- Encoding Specification (1st chart):
- x:year:ordinal
- y:mean(budget_2013$):quantitative
- Encoding Specification (2nd chart):
- x:year:ordinal
- y:median(budget_2013$):quantitative
- Encoding Specification (3rd chart):
- x:year:ordinal
- y:max(budget_2013$):quantitative
Compound Method (how to join these charts together?): Use vertical concatenation (alt.vconcat) to stack the charts on top of each other. Each line chart (mean, median, and max) shows a separate aspect of movie budgets over time, and they share the same x-axis (year) for easier comparison across the charts.
Step 2: Create your chart.¶
Please use the checkpoints below to work through the problem step-by-step. You can search for the keyword "TODO" to locate cells that need your edits.
Visualization 2 Checkpoints¶
checkpoint 1: line chart for average, median, and max of budget¶
You will get full points if you
- Specify the correct mark
- Use the correct x and y encoding
- Plot the right data
- Produce 3 line charts concatenated vertically
Your chart will look like:

# Line chart for average budget
average_budget = alt.Chart(movieDF).mark_line().encode(
x='year:O',
y='mean(budget_2013$):Q'
).properties(
width=500,
height=200
)
# Line chart for median budget
median_budget = alt.Chart(movieDF).mark_line().encode(
x='year:O',
y='median(budget_2013$):Q'
).properties(
width=500,
height=200
)
# Line chart for max budget
max_budget = alt.Chart(movieDF).mark_line().encode(
x='year:O',
y='max(budget_2013$):Q'
).properties(
width=500,
height=200
)
# Concatenate the charts vertically
budget_charts = alt.vconcat(
average_budget,
median_budget,
max_budget
)
budget_charts
checkpoint 2: adjust width, height and color¶
Each chart should be 500x100, plotted with different colors
- Complete checkpoint 1
- Adjust chart width and height
- Plot charts with different colors
Your chart will look like:

# Line chart for average budget
average_budget = alt.Chart(movieDF).mark_line().encode(
x='year:O',
y='mean(budget_2013$):Q'
).properties(
width=500,
height=100
)
# Line chart for median budget
median_budget = alt.Chart(movieDF).mark_line(color='grey').encode(
x='year:O',
y='median(budget_2013$):Q'
).properties(
width=500,
height=100
)
# Line chart for max budget
max_budget = alt.Chart(movieDF).mark_line(color='pink').encode(
x='year:O',
y='max(budget_2013$):Q'
).properties(
width=500,
height=100
)
# Concatenate the charts vertically
budget_charts = alt.vconcat(
average_budget,
median_budget,
max_budget
)
budget_charts
checkpoint 3: remove duplicated x-axis and adjust tick spacing¶
You will get full points if you
- Complete checkpoint 2
- Remove duplicate x-axes from top and middle figures
- Set the bottom x-axis to have ticks every 5 years
Your chart will look like:

# Line chart for average budget
average_budget = alt.Chart(movieDF).mark_line().encode(
x=alt.X('year:Q', axis=alt.Axis(labels=False, ticks=False)),
y='mean(budget_2013$):Q'
).properties(
width=500,
height=100
)
# Line chart for median budget
median_budget = alt.Chart(movieDF).mark_line(color='grey').encode(
x=alt.X('year:Q', axis=alt.Axis(labels=False, ticks=False)),
y='median(budget_2013$):Q'
).properties(
width=500,
height=100
)
# Line chart for max budget
max_budget = alt.Chart(movieDF).mark_line(color='pink').encode(
x=alt.X('year:Q', axis=alt.Axis(tickCount=9)),
y='max(budget_2013$):Q'
).properties(
width=500,
height=100
)
# Concatenate the charts vertically
budget_charts = alt.vconcat(
average_budget,
median_budget,
max_budget
)
budget_charts
Visualization 3: Replicate this visualization¶

*** Step 1: Write down your plan for the visualization, for all four channels (required) ***¶
- Data Name: movieDF
- mark type: circle (scatter plot)
- Encoding Specification:
- x:rating:quantitative
- y:budget_2013$:quantitative
- color:country_binary:nominal
- size:roi_int:quantitative
Step 2: Create your chart.¶
Please use the checkpoints below to work through the problem step-by-step. You can search for the keyword "TODO" to locate cells that need your edits.
checkpoint 1: scatter plot of IMDB rating vs budget (in 2013 dollars)¶
You will get full points if you
- Specify the correct mark
- Plot the right data
- Use the correct x and y encoding
Your chart will look like:

scatter_plot = alt.Chart(movieDF).mark_circle().encode(
x=alt.X('rating:Q', title='IMDB Rating'),
y=alt.Y('budget_2013$:Q', title='Budget (2013 Dollars)')
)
scatter_plot
checkpoint 2: add the color and size channels¶
- Complete checkpoint 1
- Add the color channel
- Add the size channel
- Plot the right data
Your chart will look like:

scatter_plot = alt.Chart(movieDF).mark_circle().encode(
x=alt.X('rating:Q'),
y=alt.Y('budget_2013$:Q'),
color=alt.Color('country_binary:N'),
size=alt.Size('roi_int:Q')
)
scatter_plot
checkpoint 3: adjust the Legend for the size channel¶
- Complete checkpoint 2
- set the legend for the Size channel to explicitly include 0
Your chart will look like:

Hint: You will need to adjust the legend using alt.Legend() within alt.Size()
scatter_plot = alt.Chart(movieDF).mark_circle().encode(
x=alt.X('rating:Q'),
y=alt.Y('budget_2013$:Q'),
color=alt.Color('country_binary:N'),
size=alt.Size('roi_int:Q',
legend=alt.Legend(title="ROI", values=[0, 50, 100, 200]))
)
scatter_plot
checkpoint 4: adjust the Scale for the Size channel¶
- Complete checkpoint 3
- set the scale of the size channel to map the data onto the range 10 to 300 (this maps 0 in the data to a size of 10)
Your chart will look like:

Hint: You will need to adjust the scale of the size channel using alt.Scale()
scatter_plot = alt.Chart(movieDF).mark_circle().encode(
x=alt.X('rating:Q'),
y=alt.Y('budget_2013$:Q'),
color=alt.Color('country_binary:N'),
size=alt.Size('roi_int:Q',
legend=alt.Legend(title="ROI", values=[0, 50, 100, 200]),
scale=alt.Scale(range=[10, 300]))
)
scatter_plot
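To see what `alt.Scale(range=[10, 300])` does to the size channel, it can help to work through the linear mapping by hand. The sketch below is plain Python rather than Altair, and it assumes a data domain of 0 to 200 for `roi_int`; the real upper bound depends on the dataset. For circle marks, the resulting size is interpreted as an area in square pixels.

```python
def linear_scale(value, domain, output_range):
    """Map value from domain (min, max) onto output_range (min, max) linearly."""
    d_min, d_max = domain
    r_min, r_max = output_range
    fraction = (value - d_min) / (d_max - d_min)
    return r_min + fraction * (r_max - r_min)


# Assumed domain of 0 to 200 for roi_int; the real maximum depends on the data.
for roi in [0, 50, 100, 200]:
    print(roi, linear_scale(roi, domain=(0, 200), output_range=(10, 300)))
```

Under that assumption, an ROI of 0 maps to a size of 10, so movies with no international return still appear as small visible points instead of vanishing.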
End of Lab
To submit your assignment:
- Please run all cells (Runtime > Run all), and make sure all the cells ran properly!!
- Make sure you have named your .ipynb file with your uniqname, e.g., uniqname.ipynb
- Upload your .ipynb file to Canvas.
